Causal Regularization
We argue that regularizing terms in standard regression methods not only help against overfitting finite data, but sometimes also help in getting better causal models. We first consider a multi-dimensional variable linearly influencing a target variable with some multi-dimensional unobserved common cause, where the confounding effect can be decreased by keeping the penalizing term in Ridge and Lasso regression even in the population limit. The reason is a close analogy between overfitting and confounding observed for our toy model. In the case of overfitting, we can choose regularization constants via cross-validation, but here we choose the regularization constant by first estimating the strength of confounding, which yielded reasonable results for simulated and real data. Further, we show a 'causal generalization bound' which states (subject to our particular model of confounding) that the error made by interpreting any non-linear regression as a causal model can be bounded from above whenever the functions are taken from a not too rich class.
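The population-limit analogy between confounding and overfitting can be illustrated with a small numerical sketch (all variable names, dimensions, and parameter values here are illustrative, not taken from the paper): an unobserved confounder Z drives both the covariates X and the target Y, so plain OLS absorbs the confounding bias, while a ridge penalty that shrinks the coefficients can land closer to the true causal effect even with infinite data.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 10, 5                               # observed dim, confounder dim
M = rng.normal(size=(d, k)) / np.sqrt(d)   # confounder -> covariate loadings
a = 0.1 * rng.normal(size=d)               # true causal effect (weak)
c = rng.normal(size=k)                     # confounder -> target (strong)
s2 = 0.09                                  # covariate noise variance

# Population covariances for X = M Z + eps, Y = a.X + c.Z + noise
Sigma_X = M @ M.T + s2 * np.eye(d)
cov_XY = Sigma_X @ a + M @ c               # Cov(X, Y)

def ridge_population(lam):
    """Population-limit ridge coefficients for penalty lam (lam=0 is OLS)."""
    return np.linalg.solve(Sigma_X + lam * np.eye(d), cov_XY)

errors = {lam: np.linalg.norm(ridge_population(lam) - a)
          for lam in [0.0, 0.1, 0.3, 1.0, 3.0, 10.0]}
# lam = 0 (OLS) absorbs the full confounding bias Sigma_X^{-1} M c,
# even though we used exact population covariances (infinite data).
best_lam = min((l for l in errors if l > 0), key=errors.get)
print(f"OLS error {errors[0.0]:.2f}, ridge(lam={best_lam}) error {errors[best_lam]:.2f}")
```

With a weak causal effect and strong confounding, some positive penalty beats the unregularized solution; the paper's point is that a good penalty can be picked by estimating the confounding strength rather than by cross-validation.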
Reviews: Fast Sparse Group Lasso
Summary: This paper presents a fast block coordinate descent algorithm for the sparse-group lasso problem. Two strategies are proposed to improve computational efficiency. The first quickly identifies groups of inactive features, using an easy-to-compute upper bound when checking whether the inactive-group condition holds. The second selects a set of candidate groups and updates the feature vectors inside those groups first, before iterating over all groups.
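A minimal sketch of the inactive-group screening idea, assuming the standard sparse-group-lasso KKT condition for a zero group (the function names and the particular cheap bound are illustrative, not the paper's actual algorithm):

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding operator S(v, t)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def inactive_groups(X, r, groups, lam, alpha):
    """Screen groups whose coefficients must be zero at the optimum.

    Sparse-group-lasso KKT condition for a zero group g:
        || S(X_g^T r, alpha*lam) ||_2 <= (1 - alpha) * lam * sqrt(|g|)
    Cheap bound: ||S(v, t)||_2 <= ||v||_2, so a group can be discarded
    immediately when ||X_g^T r||_2 is already below the threshold,
    skipping the soft-thresholding in the exact check.
    """
    inactive = []
    for g in groups:
        corr = X[:, g].T @ r
        thresh = (1 - alpha) * lam * np.sqrt(len(g))
        if np.linalg.norm(corr) <= thresh:              # cheap upper bound
            inactive.append(g)
        elif np.linalg.norm(soft_threshold(corr, alpha * lam)) <= thresh:
            inactive.append(g)                          # exact check
    return inactive
```

The cheap bound is conservative in the safe direction: whenever it fires, the exact condition holds as well, so no active group is ever discarded.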
Convex Tensor Decomposition via Structured Schatten Norm Regularization
We study a new class of structured Schatten norms for tensors that includes two recently proposed norms ("overlapped" and "latent") for convex-optimization-based tensor decomposition. We analyze the performance of the "latent" approach for tensor decomposition, which was empirically found to perform better than the "overlapped" approach in some settings, and show theoretically that this is indeed the case. In particular, when the unknown true tensor is low-rank in a specific unknown mode, this approach performs as well as knowing the mode with the smallest rank. Along the way, we show a novel duality result for structured Schatten norms, which is also interesting in the general context of structured sparsity. We confirm through numerical simulations that our theory can precisely predict the scaling behaviour of the mean squared error.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)
- Africa > Senegal > Kolda Region > Kolda (0.04)
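Of the two norms, the "overlapped" Schatten-1 norm is straightforward to evaluate directly as a sum of nuclear norms of the mode unfoldings; a minimal sketch (function names are ours):

```python
import numpy as np

def unfold(T, mode):
    """Mode-k unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def overlapped_schatten1(T):
    """Overlapped Schatten-1 norm: sum over modes of the nuclear norm
    (sum of singular values) of each mode unfolding."""
    return sum(np.linalg.norm(unfold(T, k), 'nuc') for k in range(T.ndim))
```

The "latent" norm, by contrast, is an infimum over decompositions T = sum_k T^(k) of the sum of mode-k nuclear norms, so evaluating it requires solving an optimization problem (e.g. by ADMM) rather than a closed-form sum.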
Optimal Training of Mean Variance Estimation Neural Networks
Sluijterman, Laurens, Cator, Eric, Heskes, Tom
This paper focuses on the optimal implementation of a Mean Variance Estimation network (MVE network) (Nix and Weigend, 1994). This type of network is often used as a building block for uncertainty estimation methods in a regression setting, for instance Concrete dropout (Gal et al., 2017) and Deep Ensembles (Lakshminarayanan et al., 2017). Specifically, an MVE network assumes that the data is produced from a normal distribution with a mean function and variance function. The MVE network outputs a mean and variance estimate and optimizes the network parameters by minimizing the negative log-likelihood. In our paper, we present two significant insights. Firstly, the convergence difficulties reported in recent work can be relatively easily prevented by following the simple yet often overlooked recommendation from the original authors that a warm-up period should be used. During this period, only the mean is optimized with a fixed variance. We demonstrate the effectiveness of this step through experimentation, highlighting that it should be standard practice. As a side note, we examine whether, after the warm-up, it is beneficial to fix the mean while optimizing the variance or to optimize both simultaneously. Here, we do not observe a substantial difference. Secondly, we introduce a novel improvement of the MVE network: separate regularization of the mean and the variance estimate. We demonstrate, both on toy examples and on a number of benchmark UCI regression data sets, that following the original recommendations and the novel separate regularization can lead to significant improvements.
- North America > United States > Massachusetts > Middlesex County > Reading (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Netherlands > Gelderland > Nijmegen (0.04)
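The warm-up recommendation can be sketched on a deliberately tiny stand-in model: a linear "network" with mean mu = a*x + b and log-variance c*x + d, trained by gradient descent on the Gaussian negative log-likelihood. All parameter values and step counts here are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = 2.0 * x + rng.normal(scale=0.3, size=200)   # true mean 2x, true std 0.3

# Linear stand-in for an MVE network: mu = a*x + b, log(sigma^2) = c*x + d
params = np.zeros(4)

def nll_grad(params, warmup):
    a, b, c, d = params
    mu = a * x + b
    # Warm-up: variance is frozen at 1 (log-variance 0), only the mean learns.
    logvar = np.zeros_like(x) if warmup else c * x + d
    var = np.exp(logvar)
    dmu = (mu - y) / var                                # d NLL / d mu
    dlv = np.zeros_like(x) if warmup else 0.5 * (1.0 - (y - mu) ** 2 / var)
    return np.array([np.mean(dmu * x), np.mean(dmu),
                     np.mean(dlv * x), np.mean(dlv)])

for _ in range(500):                       # phase 1: warm-up, mean only
    params -= 0.5 * nll_grad(params, warmup=True)
for _ in range(2000):                      # phase 2: joint optimization
    params -= 0.05 * nll_grad(params, warmup=False)

a, b, c, d = params
print(f"slope ~ {a:.2f} (true 2.0), std at x=0 ~ {np.exp(d / 2):.2f} (true 0.3)")
```

Starting the joint phase from a reasonable mean fit avoids the failure mode where a badly fitted mean inflates the variance estimate, which in turn kills the mean's gradients; that is the convergence difficulty the warm-up prevents.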
Qini-based Uplift Regression
Belbahri, Mouloud, Murua, Alejandro, Gandouet, Olivier, Nia, Vahid Partovi
This article proposes methodology that identifies characteristics associated with a home insurance policy that can be used to infer the link between marketing intervention and policy renewal rate. Using the resulting statistical model, the goal is to predict which customers the company should focus on in order to deploy future retention campaigns. A subscription-based company loses its customers when they stop doing business with its service. Also known as customer attrition, customer churn can be a drag on business growth. It is less expensive to retain existing customers than to acquire new ones, so businesses put effort into marketing strategies that reduce customer attrition. Customer loyalty, on the other hand, is usually more profitable because the company has already earned the trust of existing customers. Businesses mostly have a defined strategy for fighting customer churn over a period of time. Using available data and what they learn about churn, organizations can measure their success at retaining customers and identify strategies for improvement.
- North America > Canada > Quebec > Montreal (0.04)
- North America > Montserrat (0.04)
- Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Banking & Finance > Insurance (1.00)
- Marketing (0.87)
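The Qini criterion in the title can be sketched as follows. This is a common textbook form of the Qini curve, computed by ranking customers by predicted uplift; the paper's exact estimator may differ.

```python
import numpy as np

def qini_curve(uplift_score, treated, outcome):
    """Cumulative incremental gains, ranked by predicted uplift.

    At each depth k (top-k customers by predicted uplift):
        Q(k) = sum of treated outcomes  -  sum of control outcomes
               rescaled by the treated/control count ratio n_t(k) / n_c(k).
    A good uplift model concentrates incremental responders at the top,
    so its Qini curve rises above the diagonal of random targeting.
    """
    order = np.argsort(-uplift_score)
    t = treated[order].astype(float)
    y = outcome[order].astype(float)
    n_t = np.cumsum(t)
    n_c = np.cumsum(1 - t)
    gain_t = np.cumsum(y * t)
    gain_c = np.cumsum(y * (1 - t))
    with np.errstate(divide='ignore', invalid='ignore'):
        q = gain_t - np.where(n_c > 0, gain_c * n_t / n_c, 0.0)
    return q
```

The Qini coefficient then summarizes the curve as the area between it and the random-targeting line, analogous to how the Gini coefficient summarizes a Lorenz curve.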
Information asymmetry in KL-regularized RL
Galashov, Alexandre, Jayakumar, Siddhant M., Hasenclever, Leonard, Tirumala, Dhruva, Schwarz, Jonathan, Desjardins, Guillaume, Czarnecki, Wojciech M., Teh, Yee Whye, Pascanu, Razvan, Heess, Nicolas
Many real-world tasks exhibit rich structure that is repeated across different parts of the state space or in time. In this work we study the possibility of leveraging such repeated structure to speed up and regularize learning. We start from the KL-regularized expected reward objective, which introduces an additional component, a default policy. Instead of relying on a fixed default policy, we learn it from data. But crucially, we restrict the amount of information the default policy receives, forcing it to learn reusable behaviours that help the policy learn faster. We formalize this strategy and discuss connections to information bottleneck approaches and to the variational EM algorithm. We present empirical results in both discrete and continuous action domains and demonstrate that, for certain tasks, learning a default policy alongside the policy can significantly speed up and improve learning.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
- (2 more...)
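In the one-step (bandit) special case, the KL-regularized objective max_pi E_pi[r] - alpha * KL(pi || pi_0) has a closed-form maximizer, which gives intuition for how the temperature alpha trades off reward against staying close to the default policy. Note this sketch fixes the default policy pi_0 rather than learning it from restricted information as the paper does.

```python
import numpy as np

def kl_regularized_policy(rewards, default_policy, alpha):
    """Maximizer of E_pi[r] - alpha * KL(pi || pi_0) over one action:
    pi(a) proportional to pi_0(a) * exp(r(a) / alpha)."""
    logits = np.log(default_policy) + rewards / alpha
    w = np.exp(logits - logits.max())       # subtract max for stability
    return w / w.sum()

r = np.array([1.0, 0.0, 0.0])
pi0 = np.ones(3) / 3                        # uniform default policy
strong = kl_regularized_policy(r, pi0, alpha=0.1)   # near-greedy on reward
weak = kl_regularized_policy(r, pi0, alpha=10.0)    # stays near the default
```

Small alpha recovers greedy reward maximization; large alpha pins the policy to the default, which is what makes a well-chosen (or well-learned) default policy such an effective regularizer.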
Deep Demosaicing for Edge Implementation
Ramakrishnan, Ramchalam Kinattinkara, Jui, Shangling, Nia, Vahid Patrovi
Most digital cameras use sensors coated with a Color Filter Array (CFA) to capture channel components at every pixel location, resulting in a mosaic image that does not contain pixel values in all channels. Current research on reconstructing these missing channels, also known as demosaicing, introduces many artifacts, such as the zipper effect and false color. Many deep learning demosaicing techniques outperform classical techniques in reducing the impact of such artifacts. However, most of these models tend to be over-parameterized. Consequently, edge implementation of state-of-the-art deep-learning-based demosaicing algorithms on low-end edge devices is a major challenge. We provide an exhaustive search of deep neural network architectures and obtain a Pareto front of Color Peak Signal to Noise Ratio (CPSNR), as the performance criterion, versus the number of parameters, as the model complexity, that beats the state of the art. Architectures on the Pareto front can then be used to choose the best architecture for a variety of resource constraints. Simple architecture search methods such as exhaustive search and grid search require certain conditions on the loss function to converge to the optimum. We clarify these conditions in a brief theoretical study.
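Extracting the Pareto front from a set of (parameter count, CPSNR) evaluations is a simple dominance filter; a minimal sketch (function name is ours):

```python
def pareto_front(models):
    """models: list of (n_params, cpsnr) pairs.

    Keep models not dominated by any other, i.e. no other model has
    both fewer-or-equal parameters and strictly better CPSNR.
    Sorting by (params asc, cpsnr desc) lets a single pass suffice:
    a model is on the front iff it improves on the best CPSNR seen so far.
    """
    front = []
    for p, q in sorted(models, key=lambda m: (m[0], -m[1])):
        if not front or q > front[-1][1]:
            front.append((p, q))
    return front
```

Given such a front, a deployment simply picks the rightmost architecture whose parameter count fits the device's memory budget.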